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Abstract. A major goal of helioseismology is the three-dimensional reconstruction 
of the three velocity components of convective flows in the solar interior from sets 
of wave travel-time measurements. For small amplitude flows, the forward problem 
is described in good approximation by a large system of convolution equations. The 
input observations are highly noisy random vectors with a known dense covariance 
matrix. This leads to a large statistical linear inverse problem. 

Whereas for deterministic linear inverse problems several computationally efficient 
minimax optimal regularization methods exist, only one minimax-optimal linear 
estimator exists for statistical linear inverse problems: the Pinsker estimator. However, 
it is often computationally inefficient because it requires a singular value decomposition 
of the forward operator or it is not applicable because of an unknown noise covariance 
matrix, so it is rarely used for real-world problems. These limitations do not apply 
in helioseismology. We present a simplified proof of the optimality properties of the 
Pinsker estimator and show that it yields significantly better reconstructions than 
traditional inversion methods used in helioseismology, i.e. Regularized Least Squares 
(Tikhonov regularization) and SOLA (approximate inverse) methods. 

Moreover, we discuss the incorporation of the mass conservation constraint in the 
Pinsker scheme using staggered grids. With this improvement we can reconstruct 
not only horizontal, but also vertical velocity components that are much smaller in 
amplitude. 


1. Introduction 

Time-distance helioseismology aims at recovering the internal properties of the Sun from 
measurements of wave travel times between pairs of points [T2] . The raw observations 
in helioseismology are time sequences of images of the line-of-sight velocity on the 
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solar surface via Doppler shift measurements, for example from the Solar Dynamics 
Observatory (45 s cadence since 2010). These Doppler velocities contain information 
about the stochastic seismic wave held (acoustic waves and surface-gravity waves). Using 
a cross-correlation technique Duvall et al. ra showed that it is possible to measure the 
time it takes a wave packet to travel between any two points on the surface through the 
solar interior. The wave travel times are linked to (perturbations of) physical quantities 
via a large system of convolution equations. In this paper we focus on the estimation of 
hows. The inversion is traditionally performed using Tikhonov regularization [35] or the 
method of approximate inverse EBUE] that are respectively called in the helioseismology 
community, Regularized Least Square (RLS) [23] and (Subtractive) Optimally Localized 
Averaging (OLA/SOLA) [21J. The latter goes back to the Backus-Gilbert method [1] 
and, as pointed out by Chavent [Qj, it is also closely related to the method of sentinels 
introduced by J.L. Lions for control problems (see m- 

For overviews on linear statistical inverse problems we refer to m m m- 
Optimal rates of convergence for spectral regularization methods, in particular Tikhonov 
regularization, were shown in [3], and for the CG method in |4J. Pinsker-type estimator 
for deconvolution problems on the real line were studied theoretically in different degrees 
of generality in a series of papers by Ermakov (see e.g. mm)- The case of periodic 
deconvolution problems with noise in the operator was treated in [8J. A minimax 
estimator for spherical deconvolution over a reduced class of estimators was developed 
in [221- 

For linear inverse problems in Hilbert spaces with additive random noise Pinsker 
estimators are optimal in the following sense: For a given ellipsoid spanned by singular 
vectors of the forward operator, the Pinsker estimator minimizes the maximal risk (or 
expected square error) over this ellipsoid among all linear estimators. We point out that 
for deterministic inverse problems typically many optimal methods exist, e.g. Tikhonov 
regularization, some types of singular value decompositions, the Showalter methods 
and (asymptotically) Landweber iteration and Lardy’s methods, each of course with 
an optimal choice of the regularization parameter or stopping index (see (HU EH!). i n 
contrast, for statistical inverse problems, the Pinsker method is the only minimax linear 
estimator ([26) 30]). Moreover, it was shown by Pinsker [30] under mild assumptions 
that it is even asymptotically optimal among all (not necessarily linear) estimators if the 
noise is Gaussian. In most real world applications this estimator cannot be applied for 
two main reasons: First, it requires the computation of a Singular Value Decomposition 
(SVD) of the forward operator which is often not affordable due to the size of the 
problem. Second, the noise covariance matrix has to be known while only a poor 
estimate is generally available. This explains why other methods such as Tikhonov 
regularization or Conjugate Gradient methods are more often used for real world 
applications. However, these limitations are not problematic for the helioseismology 
problem studied here since the forward operator separates into a collection of small 
matrices for each spatial frequencies, for which an SVD can be computed in reasonable 
time, and the noise covariance matrix is known uadi!. In this paper we will study the 
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implementation and performance of Pinsker estimators for such problems. 

A notorious difficulty in local helioseismology is the inversion for vertical velocity 
components as their amplitude is much smaller than for horizontal velocities. The 
failure of inversion was reported in several publications using synthetic data (see e.g. 
mm and was explained by the crosstalk between the variables. Here, we show 
that incorporation of the mass conservation constraint in the Tikhonov or Pinsker 
methods allows to overcome these difficulties. We will discuss the implementation of 
mass conservation constraints with the help of staggered grids for the horizontal and 
the vertical velocity components. 

The plan of this paper is as follows: After introducing the physical background and 
the forward problem in Section [2| we describe in Section [3] the inversion methods that are 
commonly used in this field so far. Then we introduce the Pinsker estimator in Section[4] 
and present a simple proof that it is the unique minimax linear estimator. Section [5] is 
devoted to the incorporation of the mass conservation constraint into this regularization 
scheme. Finally, numerical results demonstrating the advantages of Pinsker methods 
compared to the state-of-the-art methods are discussed in Section [6} 

2. Estimating flows by local helioseismology 

In local helioseismology, it is acceptable to consider small patches of the solar surface 
and to neglect solar curvature. The domain of interest is approximated by a Cartesian 
box, with horizontal coordinates r = (x, y ) and vertical coordinate (height) z. Let us 
denote this domain by V. Typically, x and y span several hundreds of megameters and 
z several tens of megameters. 

The observables are time series of the line-of-sight velocities ip(r,tj) at different 
points r obtained from dopplergrams of the Sun’s surface taken by satellites at 
equidistant time points tj. From these quantities, we compute averaged travel times 
f a (r) at different points r (and at time to, but we assume the time series ip to be 
stationary) of the form 

f a (r) f Cov(ip(r, t 0 ),ip( r + r, t 0 + t j ))w a ( r, tj) dr, a = 1,..., N a . 

j 

(We reserve the symbol r° for differences of f a to a reference model.) The weights w a are 
chosen such that f a (r) approximates a spatial average of the times a certain type of wave 
packet needs to travel from point r to points r + r, see mm and the end of this section 
for more details. Hence, what will be called travel times in the following are linear 
functionals of the covariance operator of the observable ip, written as f a = W a (Cov[^]) 
or f = W(Cov[V>]) for the vector t = (f a ) a=1 .. Na of all travel times. 

The observable ip is the image of the wave displacement £ = £(r ,z,t) under an 
observation operator T, i.e. ip = T£. Ideally, ip(r,t) = l(r,t) ■ £(r, 0,t) with the unit- 
length line-of-sight vector l(r,t), but in practice T also involves the point spread function 
of the instrument. The wave displacement is linked to internal properties of the Sun via 
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a PDE describing the wave propagation in the Sun [6]: 

££ := p(d t + T + v ■ V) 2 £ — V (pc 2 V • £) + V (£ • VP) + V • (pg£) = f, 

where p is the density, c the sound speed, P the pressure, g the gravitational acceleration, 

T the damping, v the flow, and / a (stochastic) source term responsible for the excitation 
of the seismic waves. Additional terms can be included to take into account the effects 
of rotation, magnetic field or a more complex form of the gravitational term. 

Our aim is to recover the 3D flow velocity field v = (v x , v y ,v z ) from observed travel 
times f. Inversion for other physical quantities can be performed analogously. We point 
out that actually computations are performed in the frequency domain, but at least 
formally we can write the forward operator as F(v) = W(T£[v] _1 Cov[f](£[v] _1 )*7 - *), 
so we have to solve the nonlinear operator equation f = F(v) + n where n denotes 
noise. Under the assumption that v is small compared to the local wave speed, which 
is true at least in quiet parts of the Sun, the Born approximation F(v) ~ F( 0) + F'[0]v 
is sufficiently accurate ra. and we obtain the linear operator equation F' [0]v = r + n 
with r := f — F(0). The operator F'[0] can be written as an integral operator, and due 
to horizontal translation invariance the Schwartz kernel K of U'[0] only depends on the 
difference r — r ; , so 

r a (r) = f ^ K a;l3 (r'—r,z)v^(r',z)d 2 r , dz+n a (r), a = 1 ,. .., AT a (l) 

V /3e{x,y,z} 

(see [IB]). The functions K aui are known as sensitivity kernels, but in contrast to the 
convention used in helioseismology where K a '^{jc' — r ,z) is replaced by W a;/3 (r' + r ,z), 
we use a standard convolution integral as it is mathematically more convenient. The 
assumption that the kernels are invariant under horizontal translation is intimately 
connected to the assumption that we are modeling only a small patch on the solar 
surface. 

Due to mass conservation the flow velocity satisfies the equation 

div (pv) = 0, (2) 

where the mass density p is assumed to depend on z only. Note that this constraint 
reduces the effective number of unknowns of the inverse problem by about one third. 

Besides the Born approximation we will use two further simplifying assumptions: 
The first approximation consists in imposing periodic boundary conditions in the 
horizontal variables. Since the kernels are localized, aliasing artifacts can be avoided by 
zero-padding, but, nevertheless, this approximation leads to a loss of information close 
to the boundaries. We may assume without further loss of generality that the periodicity 
cell is [—7r, 7r] 2 in dimensionless coordinates. The second approximation consists in a 
discrete treatment of the depth variable z. For simplicity, we assume that the ?U(r, •) is 
represented by its values on a grid {zi ,..., z^ z } and define v^ ,Zj (r) := tU(r, Zj). 

Then, ([!]) can be written as 

N z 

r a (r ) = ^ * v p * ) (r) + n a (r), 

j = 1 /3£{x,y,z} 


a — 1,..., A^ a (3) 
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where * denotes periodic convolution. Denoting by 

/ 7r rn 

/ /(r) exp(— ir • k) dxdy , k G Z 2 , 

-7T J — 7T 

the Fourier coefficients of a periodic function / : (M/Z) 2 —>■ C, we can 
equivalently in Fourier space as 

N z 


E E K P‘‘ v 


k 


+ n\ 


k ■ 


a = 1,..., iV a ,k G Z 2 . 


3 = 1 /3e{x,y,z) 


write ([3]) 


(4) 


The problem is now decoupled for each spatial frequency k and can be written in a 
matrix form as 


Tk = AT k v k + n k , k G Z 2 (5) 

where the quantities we want to recover have been reorganized in the column vectors 
v k = {v k Z3 )p, Zj C 3JVz , the observables are r k = (r£) a G C jV °, and the Fourier 
transformed convolution kernels are K k = (K ^ ,Zj ) G C NaX3Nz . 

The noise is assumed to be translation invariant, so the noise covariance matrix, 

A a& (d) = Cov[n a (r), n b {r + d)], a, b = 1, ..., N a 

does not depend on r. As a consequence, noise vectors n k ,n k / for different spatial 
frequencies k, k' G Z 2 are uncorrelated, and the covariance matrix of n k is given by 
A k = (A £ b ) a b G C NaXNa . An expression for these matrices was first derived in [H] and 
generalized in na taking into account that the observation time is finite. 

For our computations we will use the kernels K from [32], which we are going 
to describe briefly. We consider a Cartesian patch of the solar surface containing 200 
x 200 pixels with a spatial sampling width of h x = 1.46 Mm. The vertical direction 
z is discretized with N z = 89 points using a variable step size as the variations are 
stronger close to the surface due to the density profile. This variation of the mass 
density of several orders of magnitude near the surface is one of the difficulties to invert 
for velocities. The quantity v = (v x ,v y ,v z ) we want to recover has thus 3 N z = 267 
degrees of freedom for each spatial frequency k. 

In order to improve the signal-to-noise ratio, certain averages of point-to-point 
travel times are used, for example between the center of a disk and all the points 
located at a given radius of this disk. Such types of data are sensitive to in/out flows 
in this disk. Imposing other weights on the circle leads to data that are sensitive to 
East-West or North-South flows. Varying the center of this disk on the whole surface 
of the observational domain allow to build a map of observations. We use each of these 
three averaging schemes for 16 radii from 5 Mm to 20 Mm. Moreover, we use filters for 
f, pi, p2, p3, and p4 waves. (The first one is a gravitational wave, and the latters are 
acoustic waves with 1,2,3 or 4 nodes.) This yields N a = 3 x 16 x 5 = 240 travel time 
data for each of the 200 2 points on the solar surface. Thus, the kernels K k are of the 
size 240 x 267 for each of the 200 2 frequencies k. 

To provide some intuition for the problem we are solving, a representation of kernels 
for v x and v z using different filters is given in Figure [l] The right column represents 
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Figure 1. 6 out of 240 sensitivity kernels describing the forward operator. Columns 
1 and 3: cuts at y = 0; columns 2 and 4: cuts at 2 = 0. The different rows show 
different wave filters, from top to bottom: f, pi and p3. To facilitate the comparison 
the amplitude of pi (resp. p3) kernels have been multiplied by 2 (resp. 6). The largest 
the radial order the deepest is the sensitivity. The columns 1 and 2 show East-West 
averaging schemes that are designed to be sensitive to v x velocities, and columns 3 and 
4 are in-out averaging sensitive v z velocities. 


cuts at z = 0 (surface of the Sun) for the part sensitive to v x (top) and v z (middle). 
The kernels are localized around the center indicating that the data are relatively close 
to the quantities we want to infer for. The bottom plot shows the cross-talk betwween 
v z and v x i.e. how the data sensitive to v z are related to the ones for v x . One can see 
that the amplitude is around ten times smaller than the one of the main kernel (top) 
and that the integral of this kernel is zero indicating that the average value of v z is not 
influenced by v x . The left and right columns of Figure [l] show the depth dependance of 
the kernels for different type of waves. One can see the importance of using different 
waves in order to probe different depths in the solar interior. However, all kernels are 
extremely sensitive to the surface making inversion at large depths highly ill-posed. 

An example of travel time map for a given filter is given in Figure [2] before adding 
the noise and after. The noise level corresponds to data averaged over 4 days with a 
temporal sampling of 45 seconds. Even with such a long averaging time, one can see 
that the noise is highly correlated, which underlines the importance of a good knowledge 
of the noise covariance matrix as computed in mm- 


3. Classical inversion methods used in local helioseismology 


3.1. Regularized Least Squares (RLS) 


Tikhonov regularization is generally called Regularized Least Squares (RLS) in the 
helioseismology community. Since the problem decouples for all k ([21]), we can compute 


v, 


•RLS_ 


argmm, 


Vk 


A k 2 (iF k n k - r k ) + a ||L k n k ||" 


= (K*A~ 1 K k + aL* k L k ) 1 K£ A k V k 


( 6 ) 


independently for all spatial frequencies k G Z 2 . Here a > 0 is the regularization 
parameter, and L k is a regularization matrix that can be the identity or the discretized 
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Figure 2. Noisy synthetic travel times for a supergranule velocity model from Ref. 
m- The top label indicates the filter and the factor by which the data are multiplied 
(e.g. times 2 for pi mode). The deeper the waves are travelling, the noisier the 
observations. 


version of the gradient or the Laplacian in order to impose additional smoothness on 
the solution. 

3.2. Optimally Localized Averaging (OLA) 

Different types of Optimally Localized Averaging (OLA) methods are used in 
helioseismology. Recently, it was proposed to take advantage of the convolution in 
the horizontal space and to propose a multichannel OLA [23.J. Similar to the previous 
approach the problem decouples for all frequencies and can be solved efficiently. We seek 
for a linear combination of travel times via weighting matrices W k = (W k ZjM ) e C 3iY ~ 7 N “ 
(the Fourier coefficients of weighting kernels W( r) := Il k exp(ir • k) with values 

in M . 3NzXNa ) such that 

v k = IVkTk, k e z 2 (7) 

is a good estimate of v k . 

Note from the second line in a that RLS is also of this form with 1F^ LS = 
(K k k k l K k + aL k L k )~ l K k A. k l . Inserting (JI]) into 0 yields 

r>k = W k K k v k + WW (8) 

Definition 3.1. For a regularization method of the form 0 the function JC( r) := 
Jfkez 2 ex P(d r ' k) Fourier coefficients 

V k := W k K k (9) 

and values in M. 3NzX3Nz is called the averaging kernel of the method. (Often only specific 
rows of W and K corresponding to a specific depth Zj and a Cartesian component (3 are 
considered. We will denote them by W[f3,Zj] :](r) and K.[/3,Zj ; :](r)J 
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Note from ^ that the expectation E[u] and hence the bias E[u] — v of the estimator 
v is characterized by a convolution with the averaging kernel: 

E[n] = K. * v. 

To keep the bias small the diagonal entries (a = ft) of the averaging kernel ]C a ’ Zj, ^ ,Zl ( r) 
should be well concentrated around zi ~ z 3 and r = 0 . The off-diagonal entries (ft ft a) 
measure the leakage from one Cartesian component ft to another component a and 
should be small. 

The SOLA (Subtractive OLA) methods aims at finding rows of a weighting kernel 
W indexed by ft, Zj such that the corresponding rows of the averaging kernel /C are as 
close as possible to rows of a prescribed target function T( r) G ^3N z x3N z w pp c keeping 
the noise (last term in (|8j)) small. This can be achieved by setting 

W° LA [ft,z r ,:] := a rgmm WeClxNa [\\WK k - %[ft, zy, :]|| 2 + fiWA^W*] ( 10 ) 

(see [23J ) where ji > 0 is a trade-off parameter. Other objective functional can be 
chosen, see e.g. [32]. The target function ft' /3,Zj ’ a ’ Zl ( r) for a = ft is generally chosen as a 
Gaussian in (r, zft around the point (0 ,zf). For a ft ft it is chosen as 0 . Obviously, the 
convex quadratic minimization problem © can be solved by solving the linear first 
order optimality conditions. We also mention the MOLA (Multiplicative OLA) [TOJ 
method which uses a product KT instead of the difference. 

These methods involve the target functions T and the parameter // as parameters, 
the choices of which are not obvious and involve certain subjectivity. In the next section, 
we propose to use the Pinsker estimator that is optimal in the sense that it minimizes 
the risk in a given class of functions. 


4. Pinsker estimator 


The problem described in Section [2] can be formulated as a linear operator equation 


r = Kv + n. 


( 11 ) 


in the Hilbert spaces X = L 2 ([—it, tt] 2 ) 3Nz and Y = L 2 ([—it, ir] 2 ) Na with a compact, 
linear operator K : X —>■ Y given by a matrix of convolution operators. 

We assume that the noise n is a Hilbert space process in Y with zero mean value 
and known covariance operator Cov [n]. The modelling errors that are ignored in the 
assumption E[n] =0 and references for Cov[n] have been discussed in Section [2j 

An estimator is an operator W : Y —>■ X that maps observations r to an 
approximation Wt G X of v. The risk (or expected square error) of an estimator 
W at v is defined by 

R(W,v) = E[\\W(Kv + n)-v\\ 2 ]. (12) 


If W is linear, the risk can be decomposed into a bias and a variance part using E[n] = 0: 

R(W,v) = \\(WK - I)vf + E[\\Wn\\ 2 }. (13) 


2 
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The bias || (TT^TT — I)v\\ 2 describes how far W is from the inverse of the forward operator 
while the variance term E[||fTn|| 2 ] = trace (Cov [ITn]) = trace (IT* Cov [n] W) describes 
the stochastic part of the error. 

The maximal risk of an estimator W on a set © C X is defined as 

R(W, 0) = sup R(W, v) = supE [\\W(Kv + n) - v\\ 2 ] . (14) 

tE© v£@ 

The minimax risk and the minimax linear risk on 0 are obtained by taking the infimum 
over all estimators (or all linear estimator, resp.) of © 

R N (Q) = inf R(W, 0), R l (S)= inf R(W,@). (15) 

W W linear 

A linear estimator W that attains the infimum in © is called a minimax linear 
estimator. To construct such an estimator for © we first perform a whitening by 
multiplying ( [Tl] ) from the left by Cov[n] -1 / 2 to obtain 

f = Kv + h (16) 

where f := Cov[n] -1 / 2 r and h := Cov[n] -1 / 2 n is now a white noise process, i.e. 
Cov [ft] = J Y . To ensure that K := Cov[n] _1//2 AT is well defined, we assume that Cov[n] 
is strictly positive definite, i.e. every linear functional of r contains a minimal fixed 
amount of noise. Although this assumption could be relaxed, it is simple and intuitive, 
and also guarantees compactness of K. Hence, K admits a singular value decomposition 
{(op ipi, ipi) : l e N}. This allows us to rewrite the operator equation © as a diagonal 
operator equation in sequence spaces given by 


Vi = Wi + ni (17) 

with observables yi := (Cov[n] -1 / 2 r, ^) Y and unknowns vi := (v,ipi)x- Due to 
Gaussianity the noise (rq)/ e ^ is a sequence of uncorrelated Af(0, 1) random variables. 
Let us consider linear diagonal estimators of the form 


A 

vi '■= —Vh 

0i 


W\T := y2 — (Cov[n] 1/2 t,^/)yAz 


le N 


01 


with weights A; e M. The risk R(X,v) := R(W\,v) of such estimators is given by 

i 2 n 


(1 - A ifvI + 


r(\,v) = J2 

le N L 

We will consider ellipsoids of the form 

( OO 

Q:=\veX: J^ a i v f<Q 


a 


i J 


i=i 


with Q > 0 and ai > 0. Then the risk R( A, 0) := R{W \, 0) is given by 


R{ A, 0) = Qsup 


(i -v 5 


le N 0 


E 


A? 


- 

1=1 1 


(18) 


(19) 


( 20 ) 


( 21 ) 


Lemma 4.1. Any minimax linear estimator must be of the diagonal form (18). 
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Proof. Note that since o/ —> oo, the supremum in (21) is attained at some index l\ E N, 


and R( A, 0) = R( A, {wa}) with v\ = (\fQ/ai x )cpi x E 0. If a linear estimator IF with a 
nondiagonal (infinite) matrix representation is replaced by its diagonal part diag(IU), 
the bias part of R(W, (t'diagfM 7 )}) cannot increase and the variance part strictly decreases. 
Hence, 

f?(diag(IU),0) = J R(diag(IH),{n diagW }) < R(W, {v diag{w) }) < R(W,Q ), 
which shows that W is not minimax. □ 

Even though the following result is well-known, we would like to present a short 
proof since we consider it more instructive and simpler than other proofs, e.g. in 
0 20, 3f5j (all for the equivalent regression problems version of the theorem). In 


particular, we derive the formulas (22) and (23) and not just verify them, and it becomes 


apparent that K\fQ is the bound on the bias. 

Theorem 4.2 (Pinsker estimator). Consider a sequence ( ai)i e ^ such that ai > 0 and 


lim^oo o/ = oo, and an ellipsoid of the form (20) with Q > 0. Then there exists a 


unique minimax linear estimator on 0. It is of the form (18), and its weights are given 
by 

Xi = max(l — Rai, 0), (22) 

where the constant k > 0 is the unique solution of the equation 

OO 

~^j 2 max(l — nai, 0) = 0. 


kQ 


i=i a i 


(23) 


The minimax linear risk is given by R L (Q) = yz max(l — koi , 0). 


Proof. The infimum of R(X. 0) over all sequences A can be reduced to the set A := {A E 
l 2 (N ) : HAlloo < 1} since R( A, 0) = oo if A ^ / 2 (N) and R( A, 0) strictly decreases if 
some Xj f [—1,1] is replaced by its metric projection onto [—1,1], Let us introduce the 
decomposition 


A= 1 J A* 

0<K<l/a 


with 


A, := 


A E A : sup 

jeN 


11 Ai I 


= K, 


with a := mkij aj. In view of (21) we have R( A, 0) = k 2 Q + 'Yl^Li^i/ a i) 2 f° r ^ £ A re , 
so the infimum over A E A K is attained if and only if A i = argmin| 1 _ a .| <Ka; x 2 = 
max(l — Kdi, 0) for all l E N. Note that this is (22) if k = k. Using this formula 
for the minimizer we find that 

max(l — Kai, 0) 2 


inf R( A, 0) = ip(n) with 
aga k 


<p(k) := k 2 Q + 


i=i 


(77 


Therefore infA G A-R(A, 0) = inf 0<ft <i/a <^(k). Note that ip is strictly convex and 


differentiable with p'{n) given by the left hand side of (23) since the sum is finite 
in a neighborhood of any n > 0. Moreover, lim K \ j o = oo and Lp'{l/a) = Q/a > 0. 
Therefore, p attains its infimum on (0,1/a] at the unique solution R to (p\n) = 0. □ 
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Instead of the implicit equation (|23j) for n there is also the following explicit formula 
if the sequence (a/) is non-decreasing (see [H6j): 


k = 


E N 

3=1 


n 


Q + Ef =1 £ 


with 


1 


N := max (neN: —^ai(a n — af) < Q 

i=i Gl 


From a practical point of view, this formula is only useful if Q is known exactly. But 
this is a rather unrealistic assumption. Q should rather be seen as a regularization 


parameter. But since there is a one-to-one correspondence between Q and k via (23), it 


is much simpler to consider k as regularization parameter. The choice of regularization 
parameters is an important and well-studied problem, but since the focus of this paper 
is on the comparison of regularization methods, we do not further discuss it here. 

A comparison of the linear minimax risk R L with the nonlinear one was given in 
[3fi] . Under the additional assumptions that the noise is Gaussian and that 


jeN sup^jr a) 


sup 


< oo, 


(24) 


then i? L (0) ~ R N (Q) as the noise level tends to 0. Assumption (24) was later relaxed to 


sup ? <7j/<jj + 1 < oo [20j. This assumption is very plausible in the context of our problem. 

It remains to discuss the choice of the ellipsoid 0. Without depth inversion, i.e. for 
N z — 1 and a scalar physical quantity, it is natural to define 0 in terms of some bound 
on the power spectrum of the form 

T( k ) I V k | 2 < <2- 

kez 2 

E.g. for the choice y(k) = (1 + |k| 2 ) s the ellipsoids 0 are balls in the periodic Sobolev 
H s ([— 7T, 7r] 2 ). In depth direction admissible choices of 0 are more difficult to interpret 
since the axes of the ellipsoid must coincide with the singular vectors of the forward 
operators. 

We choose the weights a/ such that a 2 grows asymptotically as the weights 
7 (k) = 7 (k(Z)) on the L 2 -Fourier coefficients defining an i/ 1 (U)-ball in a cuboid kcR 3 
as l —> oo, i.e., 


ai 


= I 1 '*. 


(25) 


Here k (l) denotes an enumeration of the three-dimensional spatial frequencies such that 
|k(Z)| is non-decreasing. Empirically, we observe that the singular values cq of our 
forward operator decay exponentially, i.e. cq ~ exp(— a — (31) for some a G E and /? > 0, 
and their ordering at least roughly corresponds to the ordering described by k(/) (see 
Figure |3|. 


5. Mass conservation constraint 

In this section we discuss how the mass conservation constraint div (pv) = 0 mentioned 
in 0 can be incorporated into Tikhonov regularization and the Pinsker method. We 
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*,( a' 1/2 k) 





Figure 3. Left panel: Singular values of the forward operator with whitening. Note 
the exponential decay of the singular values. Middle and right panel: Distribution of 
singular values in frequency space. The color at each spatial frequency k = (h x ,h y ) 
represents the number of singular values > 0.071 and >1.5 (with o\ = 41.6). These 
thresholds correspond to positive Pinsker weights and Pinsker weights \\ > 0.5, 
respectively, in Figures [4] and [6] 


will assume that 0 < p m i n < p < p max < oo, that p is smooth and depends only on z. 
Then the inverse problem can be formulated as 

Kw — r subject to div (pv) = 0. (26) 


In an abstract Hilbert space setting an equality constraint Bv — 0 with a bounded 
linear operator B : X 7L does not change much since we can simply replace X by 
the null-space Af(B) of B. However, as it is often inconvenient to explicitly construct a 
basis of Af(B), it is preferable to work in the larger space X. 

E.g. statistical Tikhonov regularization with noise covariance operator A and a 
(differential) operator L mapping to a Hilbert space V applied to (26) reads 

v« = argmin Bv=0 [||A“ 1/2 Xv - t||y + a||Lv||v] • 

To treat the side condition we consider the corresponding Lagrange function £(v, /i) := 
||A -1 / 2 iLv — t||y + a||Lv||y + (/j,,aBv)% with a Lagrange multiplier fi e Z. Here B 
has been multiplied by the regularization parameter a to improve the condition number 


of the optimality conditions = 0 and = 0. These then lead to the saddle point 
equation 



aL*L 



K* A 


-l 


r 


A 


5.1. Fully continuous setting 

In this subsection we discuss a continuous treatment of the depth variable z. If 
V = [—7r, 7r] 2 x [zn z , zq\ is the domain of interest, we may choose 

X = {v G H\V) 3 : v(-;z) periodic, v z (-,z 0 ) = v z (-,z Nz ) = 0} . 

This choice of boundary conditions rules out coronal mass ejections, which are very 
simple to detect and for which the Born approximation used in the derivation of the 
forward operator breaks down anyways. 
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We equip X with the norm ||v||x := (pv, pv)^ 2 where 

( V ' 2 

{pv,pw) H i := ^2 (P v ^ P W>S )l^(v) + (grad pv 13 , gradpu/^qqs 1 

\/3e{x,y,z) J 

Under our assumptions on p the norms ||pv||#i and ||v||/ji are equivalent, but since p 
varies over several orders of magnitude, the incorporation of p in the norm makes a 
significant difference. 

Let us introduce the operators grad p v := grad(pv), curl p v := curl(pv), and 
diVpV := div(pv). The following lemma summarizes the properties of the subspace 
A f( divp) C X and will be proved in an appendix. 

Lemma 5.1. (i) For all v, w G X we have 

^2 (grad^, grad p w /3 ) L 2 (y)3 = (curl p v, curl p w) i2(v)3 +( div p v, div p w) i2(v) (27) 

/9e{x,y,z) 

(ii) There exists a constant c > 0 such that the inequalities 



hold true for all v G X with div p v = 0. 

(Hi) X 0 := {v G X : f y pv x d(r,x) = f v pv y d(r,x) = 0} has the Helmholtz decomposition 
X 0 = A/’o( div p ) © J\f 0 ( curl p ) 

with A//(div p ) := {v G X 0 : div p v = 0} and A4(curl p ) := {v G X 0 : curl p v = 0}. 
These subspaces are orthogonal both with respect to the X inner product and the 
inner product (pv, pw) L 2 t V y 

We will choose Lv := curl(pv). This means we do not incorporate the means of 
the horizontal velicity components into the penalty term, which are needed to obtain 
a norm on {v G X : div p v = 0}. This is justified as the data are sensitive to constant 
horizontal velocities, i.e. K restricted to span{(l/p, 0, 0), (0, 1/p, 0)} is bounded from 
below (see [T3, §8.2]). 

5.2. (Semi-) discrete approximation 

In this subsection we discuss a discrete approximation of the z- variable which inherits the 
essential properties of the continuous setting. We found this crucial for good numerical 
results. Since p depends only on z. the constraint div (pv) = 0 separates into 

ik x pv£ + ikypvl + *^2 = k = (k x , k y ) G Z 2 . 

Hence the only difference between a continuous and a discrete treatment of the (periodic) 
horizontal variables x and y is that in the former case infinitely many spatial frequencies 
must be considered, and in the latter case only finitely many. 
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It will be essential to use different grids for the horizontal and the vertical velocities 
to preserve the most important properties of the continuous setting as summarized in 
Lemma 5.1 in the discrete setting. For a given grid zq > z\ > ... > z^ z in vertical 
direction we introduce the midpoints Zj + 1 / 2 := \(zj + Zj + 1 ). The horizontal velocities will 
be represented on {^ 1 / 2 , • • •, ^- 1 / 2 } whereas the vertical velocities will be represented 
by their values on {zi ,..., z n - 1 }. Here the points zo and zn z have been omitted due to 
the Dirichlet boundary conditions for v z such that 

v^vleW :=C Nz , view :=C Nz ~K 


These quantities will be indexed by vl = (if u k ,y ) T and v k = 

(3 = x,y. To define inner products on V and W we introduce 
weights 5 j := Zj- 1/2 - Zj+1/2 for j = 1,...,N Z - 1 and S j+1 / 2 := Zj - z J+1 for 
j = 0,. .., N z — 1. Then we introduce Gram matrices 


G y := diag(hi /2 ,..., S Nz - 1/2 ) and G w := diag(5i,..., 5 Nz _ 1 ) (29) 


defining inner products (vi,v 2 )v '■= v^Gyv 1 on V and {vi,v 2 )yy : = v^GyyVi on W. 
Similarly, we define pj := p(zf) for j G {0,1/2,1,..., N z } and the matrices M Y = 
diag(pi/ 2 ,... Pn z -i/ 2 ) and My = diag(pi,... Pn z -i)- We approximate derivatives by 
the finite differences 


dz 


(zj+1/2) 




- v\ 


k,j+l 


Sj+1/2 


= (D\ 


vl) 


k Jjl 


<9u k \ 


V\ 


P 


kj—1/2 V k,j+l /2 


P 


= 


for p = x,y with Df G C N ^ N ^ = L( W,V) and D Y G = L(Y,W) given 

by 


- 1 \ 



/i _1 



1 -1 

7~)V f i-\ 

l 1 


and D™ := Gf l 

1 -1 

^ 1 J 


These matrices are skew-adjoint with respect to the inner products in V and W since 
GyDf = -(G w Dj)*, and hence 

(DYw,v)y = V*GyD™W = —V*(GyyD Y )*W = —(Djv)*GyyW = ~(w,D Y v) w- (30) 


Now we introduce the following approximations to the div, grad, curl, and curl* for 


the spatial frequency k 

G Z 2 : 




divk := ( 

ikJ Y 

ik y I Y 

Df): 

VxVxWgV, 

grad k := 

( ikJ Y 

ik y I Y 

(D Y ) T ) T :YgVxVxW 


( 0 

-D] 

ikyl W \ 


curl k := 

DJ 

0 

~ikJ W 

TxVxWgWxWxV, 



ik x I Y 

0 j 
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curl k := 




0 

7-)W 

-ikyp 


_ D v 

0 

%KP 


ikyP 

0 


\ 


J 


:WxWx¥—>-¥x¥xW. 


Let us introduce the spaces Xk := ¥ x ¥ x W and Yk := W x W x ¥, the multiplication 
operator Mf := blockdiag^Mj, Mj, and the mappings 


div Pjk := di\kMj, 

curl M : = (M*) _ 1 curlf, 
The Gram matrices in X and Y are 


curlp k := curl k Mf, 

grad p k := (Mf)- 1 grad k . 


(31) 


Gx '■— ( Mf ) 2 biockdiag( Gy , Gy, Gyy) and Gy ■— blockdiag(Gyy, Gyy, Gy).(32) 
These matrices have the following properties: 

Lemma 5.2. (i) div Pj kCurlf k = divk curlk = 0 and curl p k grad p k = curlk grad k = 0. 

(n) M(DY) = { 0 } 

(in) curl^ k is the adjoint of curl P; k with respect to the Gram matrices G% and Gy, i.e. 

Gxcurl^ k = (Gy curl P; k)*, and similarly Gydiv Pj k = — (Gx grad pk )*- 
(iv) With respect to the Gram matrix G\ we have the orthogonal decomposition 

X k = A(( div Pjk ) © Af( curl Pjk ) and Af( div Pjk ) = Tl( curl^ k ) for k ^ 0 . (33) 


(34) 


Proof. Part (|ij) can be verified by straightforward computations. 

Part (jn]) is also easy to see, and part (iii) follows from (30). 

To show part (|Iv|) we first demonstrate that 

Af ( curl P;k ) njV(div Pik ) = {0} 

Let Vk = (u k , u k , n k ) G A/"(curl Pi k) fl Af( div Pi k)- We only treat the case Ay 0 as the 
case k y ^ 0 is analogous. The last line in curl p kVk = 0 implies that 

(35) 

(36) 


KMfvl = kyMfvl. 


Together with the relation div p kVk = 0 this yields 

iKD^Mjvl = klMfvl + k x k y Mjvl = |k| 2 M> k . 


From the second line in curl Pj kVk = 0 we obtain iA: x M(f¥ k = Df so 

D^DfMfvl = |k| 2 MV. 


Together with (30) we find that (Df*G^Df + (A^Gy 1 ) Mju k = 0. Since the matrix on 
the left hand side is strictly positive definite, it follows that u k = 0. Now it follows from 
part ([II]) , ( p5| ), (36) and Ay 7 ^ 0 that u k = 0 and n k = 0 , completing the proof of (34). 
From parts (jlj) and (iii) we obtain 


Af( curlpk )- 1 = K( curl# k ) C Af( div Pjk ) 

with orthogonality with respect to the inner product generated by Gx- Together with 
([34]) this implies (33]) □ 
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Remark 5.3. Let us discuss the case k = 0. We claim that in analogy to the continuous 
situation we have 


Af( curl P;0 ) Pi A/"( div Pj0 ) = {(c x (Mj) 1 e,c x (Mj) 1 e, 0) : c^Cy G C} (37) 

where e G V is the vector with all entries equal to 1. In fact, for v 0 G J\f{ curl p0 ) H 
Af( div Pi o) it follows follows from pa rt |m[ ) and div Pi o v o = 0 that Vq = 0. Note that 
Af(Df) = {c(Mj) _1 e : c G C}. Now \3ty follows from curl p ov 0 = 0. 

The projection matrices onto Af( curl p k) and A f( div Pj k) can be computed using a 
QR-decomposition of G][ 2 curl]f k : 

G T curl * k = (Q k g k ) ( f ) , Pk := G^QtQtGj/ 2 , k ^ 0.(38) 

Here Pk has full row rank p, [Qk Ok] is unitary, and Qk has p columns. We summarize 
the properties of P k : 

Lemma 5.4. Let k ^ 0 . Then P k is a projection onto A/"(div Pj k) (Pe. P k = Pk and 
P(P k ) = Af( div Pi k) ), and I — Pk is a projection onto Af( curl p> k). Pk is orthogonal both 
with respect to the inner product induced by G\ (i.e. P k G% = G xPk) and the semi-definit 
inner product induced by the (Hermitian) Gram matrix 

G k? P = Gx curl# k G Y curlp k - G x grad p k Gy div p , k 

(i-e- Pk G C = G#Pl k ). 


Proof. The identity P k Gj = G-J 2 Q k Q* k G-J 2 = G-/Pk is obvious from the definition. We 
have P k = G'“ 1 / 2 g k (0k0k)0kG'x 2 = Pk> so Pk is a projection, which implies that I — Pk 
is a projection as well. Using Lemma 5.2[ parts (|Ifi]) and (|IJ) we obtain 

77(P k ) = 77(G- 1/2 g k ) = P(curl# k ) = W(div p , k ). 

Moreover, 77(/ - P k ) = P(Pk) x = Af{ div p , k ) ± = Af( curl# k ) using ffl) and the 


self-adjointness of Pk in X. By Lemma 5.2 (iii) we have G kp = curl* k GY curl p k + 
div* k Cv div p k, so G kp is Hermitian and positive semi-definite. Moreover, since 
div Pj kPk = 0 we have 

PiGC = (p;Gi /2 ) (Gfcurl#,) G y curl,, t = (oj/ 2 Q k Q‘ k ) (Q k R k ) Gy curl,* 

= G!/ 2 (Q k R k ) Gy curl,,k = Gx curl* k Gy curl p , k = curl" k Gy cur] p , k . 

Since the right hand side of this equation is Hermitian, so is the left hand side, which 
implies P k *G^ = G?;P k . □ 

For k = 0 we define Pk as follows: 

/ Gl /2 cur 1^ 0 \ 


— (Qo Qo) 



Po ■■= G- 1,2 Q„QZGP, (39) 


1/2 


((Mj)-'e,0,0) 

\ (0,(Mj)-'e,0) J 

see Remark |5.3[ In this case P 0 is the orthogonal projection onto Af( div p , 0 ) 
{(c x (Mj)“ 1 e,c x (M ( J)“ 1 e,0) : c^Cy G C}. 
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Let us recall of the definition of the Generalized Singular Value Decomposition GSVD 
(see [HE]): Let A G M TO,n and L G M 9,n be matrices with m > n and rank(L) = p. Then 
there exist unitary matrices U G M. m ’ m and V G R 9,9 and an invertible matrix X G W ,n 
such that 


A = USX- 1 and L = VCX~ x 


(40) 


where S = diag(si, s n ) G R mXn and C = diag(ci,c m i n ( 9in )) G R 9Xn with 
1 > ci > ... > c p > c p .|_i = ... = 0. The generalized singular values a t of {A, L ) are 
<Tj = Si/ci for i = 1,... ,p, and the generalized right singular vectors of (A, L) are the first 
p columns x\,... ,x p of X. They satisfy the orthogonality relations x*L*Lxk = (?jbj,k 
for j, k G {1,..., n}. If L = I then the GSVD and the SVD coincide (except for that 
the ordering of the singular values). 


We will set A := A k K^I\ with P k defined in (|38|) and L := 


Gif 2 curl P;k 

Gif 2 divp.k 


. For 


k ^ 0 this yields vectors xi,..., £dim(x k ), Vi,..., V dim {x k ), and u\,... ,u p and numbers 
Si,..., s p , ci,..., c p > 0 such that 


A k 1/2 K k P k Xj = sjUj, j = 1 ,... ,p 

Lxj = CjVj, j = 1..., dim(Xk). 


For j < p we have 


Xj e W(A- 1/2 W k P k ) x c W(P k ) ± = W( div Pik ) 
with orthogonality w.r.t. the L-induced inner product and x*G 


h 1 


k ,P X J 


= \\ Lx if = 


"3 ' 


Therefore Xj/cj are the singular vectors of A w.r.t. this inner product, and the k-th 
Fourier coefficient of the Pinsker estimator is 


WkT k = £ 

3 = 1 


max(l — Kdj, 0) 


On 


(«i> A k 1/2r k)^ = J2 

j 3=1 


max(l — Kaj, 0) 


(%> A k 1/2 T k )£j. 


6 . Numerical results 

In the following we will compare RLS, SOLA and Pinsker methods for recovering three- 
dimensional velocity fields from travel time measurements on the solar surface. 

To compare the different inversion methods on synthetic data, we use the velocity 
model presented in m which reproduces an average supergranule. Supergranulation is 
a convection pattern with an average life time of about 1 day and a characteristic length 
of around 30 Mm that is observed at the surface of the Sun. A representation of the 
velocity field v z and v x is given in Figures |4] and [6] (top rows). This velocity is built such 
that mass is conserved, which explains the decrease of the amplitude with depth due to 
the strong density gradient. These velocities are then convolved with the kernels, and 
noise is added according to ([!]) in order to obtain travel time maps as shown in Figure [2| 
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Data: • kernels K k G C MxNa and noise covariance matrices Ak G C NaXNa for 
all frequencies k; 

• regularization parameter k; 

Result: linear estimator Il k for all frequencies k 


set up Gram matrices Gy,G^,Gx, and Gy (eqs. (29), (32)); 
for kG [-N k /2,...,N k /2-l] 2 do 


set up matrices div P;k , curl p k and P k (eqs. (31), (38), (39)) ; 


[Pk, Ak, 14, s k , Ck] = gsvd I A u 1 ^ 2 K k P k , 


G l f 2 curl P;k 
G l J 2 div pk 


(Pk, AfcU, 14 are matrices with columns u k j , x k j , v k j ; 

Sk, Ck are vectors with entries s k j, c k j); 

end 

Find bijective ordering l : ■ ■ ■, — l] 2 x {1,..., M} —>■ N%N Z such that 

^ if l(k,j) < c k q,cy > 0. l(k,j) > l(k,j) if c k:j = 0 and c k j>0; 

for k G f—iVfc/2,..., Afc/2 — l] 2 do 
p k = max{j : c kJ > 0}; 
for j = 1,... ,pk do 
a kJ := Z(k, j) 1/3 ; 

Ak,j := max(l — Ka k j,0) ; 
end 

lFk = Ak[:,l:pk]diag(^, 


A k 


Pk 


s k,p k 


P k [:,l:p k ]*A : 


-1/2 . 

k > 


end 


Algorithm 1: Pinsker algorithm with mass conservation constraint. 


6.1. Reconstruction without mass conservation 


In the RLS method we have chosen the regularization term as H 1 norm in horizontal and 
vertical directions, and in the Pinsker method the ellipsoid 0 was chosen according to 
(25)) to approximate a ball in the Sobolev space H l (V). The regularization parameters 
a and k have been chosen by the discrepancy principle. Although the discrepancy 
principle performs poorly for high dimensional white noise (and is not even well- 
defined in the infinite-dimensional case), here the noise is sufficiently correlated for 
the discrepancy principle to work reasonably well. The SOLA weighting kernels are 
obtained by minimizing (10) with a target function 


V 




r ) = exp 


2 s 2 
Lb h 


+ 


\Zj - Zi\ 

24 




where Sh and s v determine the localization of the averaging kernels in the horizontal and 
vertical directions. As usual we added a constraint for k = 0 via Lagrange multipliers to 
ensure that the integrals over the averaging kernels for horizontal velocities are 1. This 
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is not possible for the vertical velocities since constant vertical flows are in the nullspace 
of K. To allow a fair comparison with RLS and Pinsker, we did not impose a strong 
additional penalty to suppress cross-talk as in [32j since we found that this induces a 
significant loss of resolution. 

It is well-known in helioseismology that RLS and SOLA can reconstruct horizontal 
velocity components v x ,v y fairly well, but perform poorly for the reconstruction 
of vertical velocity components. Figure [4] shows the reconstruction of v z for the 
different methods without mass conservation. As expected, the results for Tikhonov 
regularization are poor except close to the surface. The SOLA method is a bit better at 
larger depths and the Pinsker estimator leads to a clear improvement with almost correct 
reconstructions at z t = —3.5 Mm and a detection of a positive value of the velocity close 
to the center at z t = —5.5 Mm. However, the amplitudes of the reconstructed velocities 
both at —3.5 Mm and in particular at —5.5 Mm are too small. 

To understand the difficulties of RLS with the reconstruction of v z , we look at the 
depth localisation of the averaging kernels. To compare the different estimators W 13 for 
some velocity component A 3 , (3 E {x, y, z}, we choose the parameters in these methods 
such that the variance E[|(IF /3 n)(r, z t ) | 2 ] at the target depth z t has the same value for all 
the methods. (Due to translation invariance of the noise covariance structure this value 
is independent of r.). Then we compare the corresponding averaging kernels /CA 2t; “’ Zj ' 
describing the bias (see Definition 3.1). In Figure |5j we represented the horizontal L 2 
norm of }C z ’ Zt ’ z ’ z i as a function of the depth z 3 for RLS, SOLA and Pinsker methods 
at two different target depths z t . One can see that the averaging kernel for the RLS 
method is mostly localized close to the surface rather than at the target depth. In 
contrast, the averaging kernel of Pinsker is much better localized at z t , but still exhibits 
some sensitivity to the values close to the surface. Intermediately, the SOLA averaging 
kernel is localized at the correct depth, but is extremely broad, so the reconstruction of 
v z at the target depth z t is greatly influenced by the other depths. 

The reconstructions of the horizontal velocity v x by Tikhonov, SOLA and Pinsker 
methods are shown in Figure [ 6 } As expected, all methods perform well. Surprisingly, 
from visual inspection Pinsker seems slightly less accurate than Tikhonov regularization 
at z t = —5.5 Mm. 


To get a better insight into the reconstructions, we can again look at the 
averaging kernels with the same choice of parameters as described above. Figure [7] 
shows JC x,Zt;x,z j(x,0) as a function of z 3 and x for three different depths z t E 
{—0.9 Mm, —3.5 Mm, —5.5 Mm} for the RLS, SOLA and Pinsker estimators. 

The differences between the three methods are the more pronounced the greater 
the target depth z t , i.e. the greater the ill-posedness. The Pinsker averaging kernels 
turn out to be the most localized, in particular in z direction while the SOLA averaging 
kernels are the least localized. RLS and Pinsker produce similar averaging kernels for 
the v x estimators, which is consistent with the observed reconstructions. However, it is 
surprising that the reconstruction with the Pinsker method is not the best at —5.5 Mm as 
the averaging kernels are the most localized. To explain this apparent inconsistency, we 
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Figure 4. Vertical velocities v z (x, y,z t ) in m/s of a supergranule model from Ref. 
[II] (top) and their reconstructions with RLS (2 n ^ row), SOLA (3 r< ^ row) and Pinsker 
(bottom) at three different depths z t S {—0.9Mm,—3.5Mm,—5.5 Mm}. The circles 
at radii 10 Mm and 20 Mm represent zero level lines of the exact solution. 


need to look at the cross-talk, i.e. how v y and v z influence the estimator of v x . Figure [8] 
shows the averaging kernels KP ,Zi ' ,y,Zj and /C x,2t;z,2) at a target depth of z t = —3.5 Mm. 
The cross talk is rather strong for Pinsker where the maximum value of the off-diagonal 
averaging kernels is only 50% smaller than the maximum JC x,Zt ’ x,z p as opposed to around 
10% for RLS and 5% for SOLA. 

6.2. Incorporation of the divergence constraint 

Figure [9] shows the reconstruction of the vertical component of the velocity for the 
Tikhonov and Pinsker methods with mass conservation constraint. It underlines the 
importance of incorporating the constraint into the inversion process. The vertical 
velocity is now properly reconstructed by both methods. 
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z k [Mm] z R [Mm] 


Figure 5. Horizontal norm of the averaging kernels /C z,2;t;z,2:/e as a function of 
Zk for target depth z t G {—3.5 Mm,—5.5 Mm} for the estimators RLS, SOLA and 
Pinsker. In each panel the regularization parameters are chosen such that for all three 
estimators W z of the vertical velocity the standard deviation at the target depth z t is 
(Var[(kP z r)(r, z t )]) 1 ' 2 = 3m/s for all r for noise corresponding to an averaging time of 
4 days. 


To better compare all the methods, Figure [lO] represents a cut of the vertical velocity 
at x = 0 and z t e {—3.5 Mm, —5.5 Mm}. Incorporating mass conservation into Tikhonov 
leads to a quite good reconstruction with an amplitude of about 70% of the true one. 
Finally, Pinsker with mass conservation is almost perfect at the depths up to —5.5 Mm 
with correct shape and amplitude. 

Finally, we also study averaging kernels. Note that in the case of divergence 
constraints we have to think again about the definition of such kernels as 5-peaks are 
not divergence free. We redefine the Fourier coefficients of the averaging kernel as 

Vmiv = {I- Pk) + PklC k P k (41) 

where Pk denotes the L 2 -orthogonal projection on the nullspace of div p . This type 
of kernel still characterizes the bias of regularization methods if they are applied 
to solutions satisfying the mass conservation constraint and if Pk is applied as a 
postprocessing step. Note that this definition implies that the Fourier coefficients of 
the averaging kernel are non-zero even at high frequencies due to the identity term. 
Thus, these averaging kernels cannot be directly compared to the ones of Section [3] 
(and thus to the ones classically used in helioseismology), but their definition using © 
is natural as the convolution of ICdw with the velocities characterizes the bias of the 
method. 

In Figure [Tl] we plot at each voxel the Frobenius norm of the 3x3 matrix of 
averaging kernels of Pinsker with divergence constraints, i.e. the Euclidean norm of all 
the 9 averaging kernels at this voxel. Note that these kernels are very well localized even 
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Figure 6. Horizontal velocities v x (x,y,z t ) in m/s of a supergranule model from Ref. 
HB (top) and their reconstruction with different methods: RLS, SOLA and Pinsker 
(from top to bottom) at three different depths z t = —0.9 Mm (left), —3.5 Mm (middle), 
and —5.5 Mm (right). The circles at 20Mm and 30Mm indicate the zero level lines of 
the exact horizontal velocity component. 


in direction and at —5.5 Mm. This explains the significant improvement in estimating 
vertical velocities achieved by the incorporation of the mass conservation constraint. 
We can even achieve a reasonable resolution in 3 -direction, which is not needed for 
reconstructing the supergranule model used as test example. 

7. Conclusions 

We have shown that Pinsker estimators yield significantly better reconstructions 
of vertical velocities from travel time maps than Tikhonov regularization, and is 
also superior to Subtractive Optimally Localized Averaging. This is consistent 
with theoretical optimality properties of these estimators. However, as soon as 
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Figure 7. Averaging kernels JC K,Zt ' ,x,Zj (x, 0) (characterizing the v* influence on 
the bias of the v x estimators) as functions of x and Zj for target depths z t E 
{—0.9 Mm,—3.5 Mm,—5.5 Mm} for the estimators SOLA, RLS, and Pinsker. The 
variances of these estimators for each target depth are chosen to be of the same size. 


depth inversion is involved, no simple, precise characterization of the ellipsoids on 
which Pinsker method is optimal is available. This is the usual situation for all 
spectral regularization methods such as Tikhonov regularization, Showalter’s method, 
Landweber iteration, and many others in a deterministic context. As opposed to many 
other real-world problems, Pinsker estimators are computationally efficient and easy to 
implement in the context of local helioseismology. 

The mass conservation constraint can be incorporated naturally into the Pinsker 
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Figure 8. Cross-talk averaging kernels JC^ Zt]y,Zj and KP ,Zt]Z,Zj for z t = —3.5 Mm for 
the SOLA, Tikhonov, and Pinsker methods. These kernels characterize the influence 
of the variables v y and v z on the bias of the ^-estimators. 


estimator leading to another significant improvement of accuracy and resolution. Under 
realistic noise levels this yields reliable estimators of vertical velocity components up to 
a depth of —5.5 Mm using travel times from / and pi to P 4 modes. 

Alternatively, one may study an adaptive, data-driven choice of the size of the 
ellipsoid in the Pinsker method, which may be interpreted as a regularization parameter. 
We plan to address this as well as the application to real data in future work. 
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Figure 9. Reconstructions for v z as in Figure [4] but by imposing mass conservation 
(top: RLS, bottom: Pinsker). 




Figure 10. Comparison of the different methods to reconstruct the vertical velocity 
v z (x,0,Zt) at z t = —3.5Mm (left) and zt = —5.5Mm (right). The Pinsker method 
with mass conservation provides the best reconstruction. 
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Figure 11. Pointwise Frobenius norm of the 3x3 averaging kernels /Cdiv for 
the Pinsker method with mass conservation at three different depths z t G 
{—0.9 Mm, —3.5 Mm, —5.5 Mm}. 
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Appendix 


This appendix contains the proof of Lemma 5.1 


Proof. We make the substitutions p = pv and q = pw. To prove dll) , note that 


Y (grad/, grad/) L 2 (v)3 = Y 


/?e{x,y,z} 


0 , 7S{x,y,z} 


dph QqP 

d'y d'y 


and 


curlp, curlq) L 2( V) 3 + (divp, divq) L 2 ( y) 

dpP QqP r I" QpP Qqi 

dx+ ^l 


= E 


P,i€{*,y,z} V 1 1 £#7' 

For all terms in the second sum (coming among other terms from the curl part) we 
can perform partial integrations without boundary terms to see that these terms vanish. 
(Note that this would not work without the Dirichlet boundary conditions for the z- 
components, e.g. for (3 = x and 7 = z.) Therefore, the left hand sides of the last two 
equations are equal. 

Part (Jii]): As 

IIpIIhUv ) 3 = 'Yj IIP^IIi 2 (V) + II g ra d/||i2(W)3 

Pe{x,y,z} 


<9q <9q 


dph dq 1 


d/3 d'y d'y d/3 


dx 


the second inequality in (28) follows from (27) and the Cauchy-Schwarz inequality 


/y/d(r, z) < ||/||l 2 |V| 1/2 for /3 e {x,y}. 


To prove the first inequality in (28) it suffices to show that there exists a constant C > 0 
such that 


\\P 


^|| L 2<C||gradp^|| L 2 + C(1 | ^ | ^ ) J P?d(r,z) 


for all j3 e {x, y, z 


and all p e X. For (3 = z this follows from the Poincare inequality due to the Dirichlet 
boundary conditions, and for f3 G {x, y} it is a consequence of the Poincare-Wirtinger 
inequality. 

Part (iii): To show orthogonality w.r.t. the inner product (pv, pw) L2 , v p let 
j). Then by potential theorems (see e.g. [291 Thm. 3.37]) we have 


v G A/"o(cur 

pv = grad / for some / G H l (V). It follows by partial integration that (pv, pw) 


L 2 (V) 
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fyfdiVpwdx = 0 for all w e A/" 0 (div p ) where the boundary terms vanish due to 
the boundary conditions. Hence, v _L A4(div p ) w.r.t. the weighted L 2 inner product. 
Together with ( |27[ ) we also obtain orthogonality w.r.t. the X inner product. 

Let v e X 0 satisfy (pv, pw) i2 (m = 0 for all w e A/" 0 (div p ) and all w e A/^>(curl p ). 
We aim to show that v = 0. Since div p (p _1 curlg) = 0 and curl p (p -1 grad /) = 0 
for all smooth / and g vanishing at the boundaries, we may choose w = p -1 curlg 
or w = p _ 1 grad/ and perform partial integrations to obtain (curl p v, g) i 2 ( y) = 0 an d 
(div p v, /} = 0. Therefore curl p v = 0 and div p v = 0, and so v = 0 from part ([h]). This 
shows that the sum of the nullspaces is dense in L 2 {V). Since the nullspaces are closed 
and orthogonal in X 0 , their sum equals X 0 . □ 
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